
[Hardware][NVIDIA] Add non-NVML CUDA mode for Jetson #9735

Merged
merged 12 commits into vllm-project:main from fix-nvml-jetson-support on Nov 26, 2024

Conversation

conroy-cheers
Contributor

NVIDIA Jetson devices do not support the NVML protocol.

This PR adds a discrete CudaJetsonPlatform class which does not rely on NVML and is used automatically on Jetson devices when NVML cannot be initialized. The Jetson platform supports all CUDA functionality but does not support NVLink; platform-specific checks are adjusted where appropriate to account for this.

Jetson Orin devices use CUDA capability 8.7, which is mostly identical to 8.6; capability 8.7 has been added to the definitions where 8.6 is used.

This has been built and tested on a Jetson AGX Orin.

Fixes #9728
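
For illustration, here is a minimal sketch of the fallback described above, assuming pynvml is available; the CudaPlatform and CudaJetsonPlatform names follow this description, and the has_nvlink_support attribute is a hypothetical stand-in for the adjusted platform-specific checks:

import pynvml

class CudaPlatform:
    """Stand-in for the NVML-backed platform class."""
    has_nvlink_support = True  # hypothetical attribute, for illustration only

class CudaJetsonPlatform:
    """Stand-in for the non-NVML Jetson platform class."""
    has_nvlink_support = False  # Jetson devices have no NVLink

def nvml_available() -> bool:
    """True if NVML can be initialized; fails on Jetson without NVML support."""
    try:
        pynvml.nvmlInit()
        pynvml.nvmlShutdown()
        return True
    except pynvml.NVMLError:
        return False

# Select the platform automatically, as described above.
current_platform = CudaPlatform() if nvml_available() else CudaJetsonPlatform()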


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

@youkaichao
Member

I don't think you need to create another platform class.

If Jetson does not support NVLink, you can just detect Jetson and return False from the NVLink detection.
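
A minimal sketch of that simpler alternative; cuda_is_jetson matches the detection helper discussed later in this thread, while is_full_nvlink and its signature are illustrative assumptions:

import os

def cuda_is_jetson() -> bool:
    # Jetson detection via a Tegra-specific file; no CUDA or NVML calls needed.
    return os.path.isfile("/etc/nv_tegra_release")

def is_full_nvlink(device_id: int) -> bool:
    if cuda_is_jetson():
        return False  # Jetson devices have no NVLink
    ...  # otherwise fall through to the existing NVML-based check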

@youkaichao
Member

Creating a platform just for Jetson is overkill.

@conroy-cheers
Contributor Author

It's not just NVLink; currently that's only used for custom allreduce, as far as I can tell.

The CUDA platform class assumes NVML is available throughout, in order to avoid initialising a CUDA context, but Jetson doesn't have NVML, so a different approach needs to be taken.

I didn't want to pollute the CUDA platform by initialising a CUDA context, so I created a different one.

Happy to rework this if you prefer a different approach?

@conroy-cheers
Contributor Author

Alternatively, we could try to abstract away the NVML calls and route them to an alternative backend on Jetson - jetson_stats might be usable, but I'm not sure all features would be implemented (especially if we want to use other NVML features in the future).

@conroy-cheers
Contributor Author

I believe Jetson is also currently the only CUDA-supporting device to use a unified memory configuration, so some optimisations may be possible there (e.g. handing weights over between CPU and GPU without a copy).

However, it could be argued that the separate backend should wait until such optimisations are actually implemented.

@tlrmchlsmth
Collaborator

Looking at the logs, adding 8.7 to the ARCHS does not affect the binary size at all:

This PR:

[2024-10-27T13:36:44Z] #28 0.789 Wheel dist/vllm-0.6.3.post2.dev123+gda79e3e5.cu124-cp38-abi3-linux_x86_64.whl is within the allowed size (184.58 MB).

Latest main:

[2024-10-28T05:09:01Z] #28 0.878 Wheel dist/vllm-0.6.3.post2.dev125+g32176fee.cu124-cp38-abi3-linux_x86_64.whl is within the allowed size (184.58 MB).

Does this make sense?

@conroy-cheers
Contributor Author

Possibly - as far as I know, 8.6 and 8.7 are essentially compatible, so it's possible it's just the same kernels being linked twice?

I did have to add the 8.7 capability to get vLLM running on Orin, though; without it, it would complain that there was no compatible kernel available.

@conroy-cheers conroy-cheers force-pushed the fix-nvml-jetson-support branch from 400c3d1 to ef7ff3c on October 29, 2024 02:58
@tlrmchlsmth
Collaborator

Possibly - as far as I know, 8.6 and 8.7 are essentially compatible, so it's possible it's just the same kernels being linked twice?

I did have to add the 8.7 capability to get vLLM running on Orin, though; without it, it would complain that there was no compatible kernel available.

OK, I didn't catch this before, but this PR isn't adding the sm87 kernels to the wheel file (it just adds support for compiling them). IIUC, Jetson is ARM-only and the vLLM wheels are only built for x86, so this all looks good to me for this PR.

@tlrmchlsmth (Collaborator) left a comment

I'll leave it to @youkaichao to decide whether it makes sense to have Jetson in its own platform, but LGTM otherwise.

@mergify mergify bot added the ci/build label Oct 29, 2024
@youkaichao
Member

I found https://github.com/dusty-nv/jetson-containers/tree/master/packages/llm/vllm, which builds vLLM on Jetson. I don't see any manipulation like this there.

@conroy-cheers
Contributor Author

conroy-cheers commented Oct 29, 2024

I found https://github.com/dusty-nv/jetson-containers/tree/master/packages/llm/vllm, which builds vLLM on Jetson. I don't see any manipulation like this there.

Just investigated this. NVIDIA JetPack 6 includes NVML support, so the alternate platform will not be needed on JetPack 6.

However, JetPack 5 and older still do not support NVML, and JetPack 6 is only being released for the Orin series, not for any older Jetson devices. So if we want to support JetPack 5 devices, we will need to use the alternate non-NVML approach.

@conroy-cheers conroy-cheers force-pushed the fix-nvml-jetson-support branch 4 times, most recently from 51085cf to c93ecc4 on October 29, 2024 23:50
@conroy-cheers
Contributor Author

conroy-cheers commented Nov 11, 2024

@youkaichao Sorry, could you review this again? It would be great if it could be merged soon. Thanks!

Can this branch build VLLM directly without relying on Jetson containers?

Yes, I am currently building this directly on Jetson, outside of containers. The build process is fully specified by my nixpkgs branch; for Jetson support, make sure to override the source hash to use this branch.

You are welcome to build it directly using Nix (I am running jetpack-nixos) or to adapt it to whatever build scripts you are using.

@conroy-cheers
Contributor Author

Bump - still pending review.

import os

def cuda_is_jetson() -> bool:
    # Detect Jetson via Tegra-specific filesystem paths, without touching
    # CUDA or NVML; the second path is assumed here to complete the check.
    return os.path.isfile("/etc/nv_tegra_release") \
        or os.path.exists("/sys/class/tegra-firmware")
Member

I need to check with the NVIDIA folks how robust this is.

Contributor Author

The check came from this thread:
rapidsai/dask-cuda#400 (comment)

context.log_warnings()


class CudaPlatform(Platform):
Member

We can have NvmlCudaPlatform and NonNvmlCudaPlatform inheriting from Platform, and at the end of this file, based on whether we are on Jetson or not, define a variable CudaPlatform that points to either NvmlCudaPlatform or NonNvmlCudaPlatform.
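
A minimal sketch of this structure; the Platform base and the get_device_name method here are illustrative stand-ins rather than vLLM's actual interfaces:

import os

class Platform:
    """Stand-in for the shared base class."""

class NvmlCudaPlatform(Platform):
    # Queries device properties through NVML, avoiding CUDA context creation.
    @classmethod
    def get_device_name(cls, device_id: int = 0) -> str:
        import pynvml
        pynvml.nvmlInit()
        try:
            handle = pynvml.nvmlDeviceGetHandleByIndex(device_id)
            name = pynvml.nvmlDeviceGetName(handle)
            # Older pynvml versions return bytes rather than str.
            return name.decode() if isinstance(name, bytes) else name
        finally:
            pynvml.nvmlShutdown()

class NonNvmlCudaPlatform(Platform):
    # Falls back to torch's CUDA APIs, which do initialise a CUDA context.
    @classmethod
    def get_device_name(cls, device_id: int = 0) -> str:
        import torch
        return torch.cuda.get_device_name(device_id)

def cuda_is_jetson() -> bool:
    return os.path.isfile("/etc/nv_tegra_release") \
        or os.path.exists("/sys/class/tegra-firmware")

# At the end of the module, bind CudaPlatform once based on the detection.
CudaPlatform = NonNvmlCudaPlatform if cuda_is_jetson() else NvmlCudaPlatform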


mergify bot commented Nov 21, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @conroy-cheers.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 21, 2024
@youkaichao (Member) left a comment

LGTM now! Thanks for the fix!

@youkaichao
Member

@conroy-cheers can you please merge main to solve the conflict?

@mgoin (Member) left a comment

Ditto, thanks for the responses!

@mergify mergify bot removed the needs-rebase label Nov 26, 2024
@conroy-cheers
Contributor Author

conroy-cheers commented Nov 26, 2024

Merged main and retested on Jetson. Ready to go, assuming CI passes 👍

@conroy-cheers conroy-cheers force-pushed the fix-nvml-jetson-support branch from 6644c52 to 040dc12 on November 26, 2024 06:05
@youkaichao youkaichao enabled auto-merge (squash) November 26, 2024 06:37
@github-actions github-actions bot added the ready label Nov 26, 2024
auto-merge was automatically disabled November 26, 2024 10:59

Head branch was pushed to by a user without write access

@conroy-cheers conroy-cheers force-pushed the fix-nvml-jetson-support branch from 040dc12 to 77daa0f on November 26, 2024 10:59
@conroy-cheers
Contributor Author

CI seems to be failing; I'm not sure whether it's related to the changes here.

@mgoin mgoin mentioned this pull request Nov 26, 2024
@mgoin
Member

mgoin commented Nov 26, 2024

@conroy-cheers the failing test is unrelated. I'll ask for a force merge - thanks for your work!

@youkaichao youkaichao merged commit f5792c7 into vllm-project:main Nov 26, 2024
69 of 71 checks passed
afeldman-nm pushed a commit to neuralmagic/vllm that referenced this pull request Nov 26, 2024
afeldman-nm pushed a commit to neuralmagic/vllm that referenced this pull request Dec 2, 2024
sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024
BKitor pushed a commit to BKitor/vllm that referenced this pull request Dec 30, 2024
Labels
ci/build, ready

Projects
None yet

Development
Successfully merging this pull request may close these issues:

[Bug]: Jetson support regression

5 participants